Cauchy distribution#
The Cauchy distribution is the canonical example of a heavy-tailed continuous distribution where the usual Central Limit Theorem intuition breaks down: the mean and variance do not exist.
We’ll treat it as a first-class modeling object (not just a pathological counterexample): it appears in ratio statistics, robust error models, and as the Student-t distribution with 1 degree of freedom.
Learning goals#
Know the definition (PDF, CDF) and how it relates to Student-t.
Understand why moments diverge and what statistics are still well-behaved (median, quantiles).
Implement sampling from scratch with NumPy (inverse CDF).
Use `scipy.stats.cauchy` for evaluation, simulation, and MLE fitting.
import platform
import numpy as np
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
import scipy
from scipy import optimize
from scipy.stats import cauchy, norm
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(7)
print("Python", platform.python_version())
print("NumPy", np.__version__)
print("SciPy", scipy.__version__)
Python 3.12.9
NumPy 1.26.2
SciPy 1.15.0
1) Title & classification#
Name: `cauchy`
Type: continuous distribution
Support: \(x \in (-\infty, \infty)\)
Parameter space: location \(x_0 \in \mathbb{R}\) and scale \(\gamma > 0\)
We write \(X \sim \mathrm{Cauchy}(x_0, \gamma)\).
The standard Cauchy is \(\mathrm{Cauchy}(0,1)\).
2) Intuition & motivation#
What it models#
The Cauchy distribution models real-valued outcomes with extremely heavy tails: large deviations are not just possible, they are common enough that averaging does not settle down.
A useful tail heuristic:
Normal tails decay like \(\exp(-x^2)\)
Cauchy tails decay like \(1/x^2\) (so the survival function decays like \(1/x\))
This makes Cauchy a good model for occasional, very large values.
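The tail heuristic above can be checked numerically with the survival functions from `scipy.stats` (a small self-contained sketch):

```python
import numpy as np
from scipy.stats import cauchy, norm

# Tail probabilities P(X > x): normal tails vanish, Cauchy tails linger.
for xv in [2.0, 5.0, 10.0]:
    print(f"x={xv:4.1f}  normal sf={norm.sf(xv):.2e}  cauchy sf={cauchy.sf(xv):.2e}")

# For large x the Cauchy survival function behaves like 1/(pi*x).
print("cauchy.sf(10):", cauchy.sf(10.0), " vs 1/(10*pi):", 1.0 / (10.0 * np.pi))
```

At \(x=10\) the normal tail probability is astronomically small while the Cauchy tail probability is still about 3%.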
Typical real-world use cases#
Ratio statistics: if \(Z_1, Z_2 \sim \mathcal{N}(0,1)\) i.i.d., then \(Z_1/Z_2\) is standard Cauchy.
Robust error models: using a Cauchy likelihood downweights outliers even more aggressively than a Student-t with larger degrees of freedom.
Weakly-informative priors: the (half-)Cauchy is a popular prior for scale parameters in Bayesian hierarchical models.
Relations to other distributions#
Student-t: \(\mathrm{Cauchy}(0,1)\) is exactly Student-\(t\) with \(\nu=1\) degree of freedom.
Stable distributions: Cauchy is stable with stability parameter \(\alpha=1\).
Uniform → Cauchy: if \(U\sim \mathrm{Unif}(0,1)\) then \(\tan\bigl(\pi(U-\tfrac{1}{2})\bigr)\) is standard Cauchy.
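Two of these relations are easy to verify directly with SciPy (a quick sanity-check sketch):

```python
import numpy as np
from scipy.stats import cauchy, t

# Student-t with df=1 is exactly the standard Cauchy.
xs = np.linspace(-5.0, 5.0, 101)
print("max pdf difference:", np.max(np.abs(t.pdf(xs, df=1) - cauchy.pdf(xs))))

# Uniform -> Cauchy: tan(pi*(U - 1/2)) has the standard Cauchy CDF.
u = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
print("transform matches CDF:", np.allclose(cauchy.cdf(np.tan(np.pi * (u - 0.5))), u))
```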
3) Formal definition#
PDF#
For \(X \sim \mathrm{Cauchy}(x_0, \gamma)\):
\[
f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}
\]
CDF#
\[
F(x; x_0, \gamma) = \frac{1}{2} + \frac{1}{\pi}\arctan\!\left(\frac{x - x_0}{\gamma}\right)
\]
Quantile function (inverse CDF)#
\[
Q(p; x_0, \gamma) = x_0 + \gamma\tan\!\left(\pi\left(p - \tfrac{1}{2}\right)\right), \qquad p \in (0, 1)
\]
We’ll implement these directly (NumPy-only) and cross-check against SciPy.
def cauchy_pdf(x: np.ndarray, x0: float = 0.0, gamma: float = 1.0) -> np.ndarray:
x = np.asarray(x, dtype=float)
if gamma <= 0:
raise ValueError("gamma must be > 0")
z = (x - x0) / gamma
return 1.0 / (np.pi * gamma * (1.0 + z * z))
def cauchy_logpdf(x: np.ndarray, x0: float = 0.0, gamma: float = 1.0) -> np.ndarray:
x = np.asarray(x, dtype=float)
if gamma <= 0:
raise ValueError("gamma must be > 0")
z = (x - x0) / gamma
return -np.log(np.pi * gamma) - np.log1p(z * z)
def cauchy_cdf(x: np.ndarray, x0: float = 0.0, gamma: float = 1.0) -> np.ndarray:
x = np.asarray(x, dtype=float)
if gamma <= 0:
raise ValueError("gamma must be > 0")
return 0.5 + np.arctan((x - x0) / gamma) / np.pi
def cauchy_ppf(p: np.ndarray, x0: float = 0.0, gamma: float = 1.0) -> np.ndarray:
p = np.asarray(p, dtype=float)
if gamma <= 0:
raise ValueError("gamma must be > 0")
if np.any((p <= 0) | (p >= 1)):
raise ValueError("p must be in (0, 1)")
return x0 + gamma * np.tan(np.pi * (p - 0.5))
# Quick cross-check with SciPy
x = np.linspace(-5, 5, 9)
print("max |pdf - scipy|:", np.max(np.abs(cauchy_pdf(x) - cauchy.pdf(x))))
print("max |cdf - scipy|:", np.max(np.abs(cauchy_cdf(x) - cauchy.cdf(x))))
max |pdf - scipy|: 1.3877787807814457e-17
max |cdf - scipy|: 1.1102230246251565e-16
4) Moments & properties#
Mean, variance, skewness, kurtosis#
For a Cauchy distribution:
Mean: undefined (does not exist as a finite expectation)
Variance: undefined / infinite
Skewness: undefined
Kurtosis: undefined
Even though the PDF is symmetric about \(x_0\), the expectation fails to exist because the integral \(\int |x| f(x)\,dx\) diverges (the defining integral is not absolutely convergent).
What does exist and is often useful:
Median: \(x_0\)
Mode: \(x_0\)
Quantiles: \(Q(p)=x_0+\gamma\tan(\pi(p-1/2))\)
IQR: \(Q(0.75)-Q(0.25)=2\gamma\) (so \(\gamma = \tfrac{\mathrm{IQR}}{2}\))
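These quantile facts can be checked against SciPy in a few lines (a self-contained sketch with an arbitrary choice of parameters):

```python
import numpy as np
from scipy.stats import cauchy

x0, gamma = 2.0, 0.5
q25, q75 = cauchy.ppf([0.25, 0.75], loc=x0, scale=gamma)
print("Q(0.25), Q(0.75):", q25, q75)        # x0 - gamma, x0 + gamma
print("IQR/2:", (q75 - q25) / 2.0)          # recovers gamma
print("median:", cauchy.median(loc=x0, scale=gamma))  # recovers x0
```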
MGF and characteristic function#
MGF \(M_X(t)=\mathbb{E}[e^{tX}]\) does not exist for any nonzero \(t\).
The characteristic function does exist:
\[
\varphi_X(t) = \mathbb{E}\bigl[e^{itX}\bigr] = \exp\bigl(i x_0 t - \gamma |t|\bigr)
\]
Entropy#
The (differential) entropy is finite:
\[
h(X) = \log(4\pi\gamma)
\]
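The closed-form entropy agrees with SciPy's `entropy` method (a quick self-contained check):

```python
import numpy as np
from scipy.stats import cauchy

# h(X) = log(4*pi*gamma); SciPy adds log(scale) to the standard entropy.
for gamma in [0.5, 1.0, 2.0]:
    print(f"gamma={gamma}: closed form {np.log(4.0 * np.pi * gamma):.6f}, "
          f"scipy {cauchy.entropy(scale=gamma):.6f}")
```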
# Visualizing the characteristic function
# φ(t) = exp(i x0 t - γ|t|)
x0, gamma = 0.0, 1.0
t = np.linspace(-12, 12, 2000)
phi = np.exp(1j * x0 * t - gamma * np.abs(t))
fig = make_subplots(rows=1, cols=2, subplot_titles=("Re φ(t)", "Im φ(t)"))
fig.add_trace(go.Scatter(x=t, y=np.real(phi), mode="lines", name="Re"), row=1, col=1)
fig.add_trace(go.Scatter(x=t, y=np.imag(phi), mode="lines", name="Im"), row=1, col=2)
fig.update_xaxes(title_text="t", row=1, col=1)
fig.update_xaxes(title_text="t", row=1, col=2)
fig.update_layout(width=950, height=350, showlegend=False)
fig.show()
5) Parameter interpretation#
\(x_0\) (location): shifts the distribution left/right. It is the median and mode.
\(\gamma\) (scale): stretches the distribution. It is the half-width at half-maximum (HWHM):
\[
f(x_0 \pm \gamma) = \tfrac{1}{2} f(x_0), \qquad f(x_0) = \frac{1}{\pi\gamma}
\]
A practical, robust interpretation:
\(Q(0.75)=x_0+\gamma\) and \(Q(0.25)=x_0-\gamma\) so \(\gamma = \tfrac{\mathrm{IQR}}{2}\).
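Both interpretations of \(\gamma\) are easy to verify numerically (a self-contained sketch with arbitrary parameter values):

```python
import numpy as np
from scipy.stats import cauchy

x0, gamma = 1.0, 0.5
# HWHM: the density one scale unit from the center is half the peak density.
peak = cauchy.pdf(x0, loc=x0, scale=gamma)
print("peak f(x0):", peak, " = 1/(pi*gamma):", 1.0 / (np.pi * gamma))
print("f(x0 + gamma)/f(x0):", cauchy.pdf(x0 + gamma, loc=x0, scale=gamma) / peak)  # 0.5
print("f(x0 - gamma)/f(x0):", cauchy.pdf(x0 - gamma, loc=x0, scale=gamma) / peak)  # 0.5
```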
# Shape changes: PDF and CDF for different (x0, γ)
x = np.linspace(-10, 10, 2000)
params = [
(0.0, 0.5),
(0.0, 1.0),
(0.0, 2.0),
(2.0, 1.0),
]
fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))
for x0, gamma in params:
fig.add_trace(
go.Scatter(x=x, y=cauchy_pdf(x, x0=x0, gamma=gamma), mode="lines", name=f"x0={x0}, γ={gamma}"),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(x=x, y=cauchy_cdf(x, x0=x0, gamma=gamma), mode="lines", name=f"x0={x0}, γ={gamma}"),
row=1,
col=2,
)
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_yaxes(title_text="f(x)", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="F(x)", row=1, col=2)
fig.update_layout(width=1000, height=420)
fig.show()
6) Derivations#
Expectation (why the mean does not exist)#
For the standard Cauchy \(f(x)=\frac{1}{\pi(1+x^2)}\):
\[
\mathbb{E}[X] = \int_{-\infty}^{\infty} \frac{x}{\pi(1+x^2)}\,dx
\]
The integrand is odd, and the principal value of this integral is 0, but the expectation is defined via absolute integrability:
\[
\int_{-\infty}^{\infty} |x|\, f(x)\,dx < \infty
\]
However, for large \(|x|\):
\[
|x|\, f(x) = \frac{|x|}{\pi(1+x^2)} \sim \frac{1}{\pi |x|},
\]
and \(\int^\infty \frac{1}{x}\,dx\) diverges. Therefore the mean is undefined.
A useful explicit calculation for the truncated absolute moment:
\[
\int_{-A}^{A} |x|\, f(x)\,dx = \frac{1}{\pi}\ln\bigl(1 + A^2\bigr) \to \infty \quad \text{as } A \to \infty.
\]
Variance (why it does not exist)#
Similarly,
\[
\mathbb{E}[X^2] = \int_{-\infty}^{\infty} \frac{x^2}{\pi(1+x^2)}\,dx,
\]
but for large \(x\), \(x^2 f(x) \sim \frac{1}{\pi}\), so the integral diverges linearly.
Likelihood#
Given i.i.d. data \(x_1,\dots,x_n\) from \(\mathrm{Cauchy}(x_0,\gamma)\), the log-likelihood is:
\[
\ell(x_0, \gamma) = -n\log(\pi\gamma) - \sum_{i=1}^{n} \log\!\left(1 + \left(\frac{x_i - x_0}{\gamma}\right)^2\right)
\]
The score equations (set derivatives to 0) have no simple closed form; MLE is typically found numerically.
# Truncated absolute moment grows ~ log A (so the mean cannot exist)
A = np.logspace(0, 4, 30) # 1 ... 10^4
trunc_abs_moment = (1.0 / np.pi) * np.log1p(A * A) # exact for standard Cauchy
fig = go.Figure()
fig.add_trace(go.Scatter(x=A, y=trunc_abs_moment, mode="lines+markers"))
fig.update_xaxes(title_text="A", type="log")
fig.update_yaxes(title_text=r"∫_{-A}^{A} |x| f(x) dx")
fig.update_layout(title="Truncated E[|X|] diverges (log growth)", width=850, height=420)
fig.show()
def cauchy_loglik(x: np.ndarray, x0: float, gamma: float) -> float:
return float(np.sum(cauchy_logpdf(x, x0=x0, gamma=gamma)))
def cauchy_mle_scipy(x: np.ndarray) -> tuple[float, float]:
"""MLE via SciPy optimizer on (x0, log_gamma)."""
x = np.asarray(x, dtype=float)
def nll(theta: np.ndarray) -> float:
x0, log_gamma = float(theta[0]), float(theta[1])
gamma = float(np.exp(log_gamma))
return -cauchy_loglik(x, x0=x0, gamma=gamma)
# Robust start: median and IQR/2 (exact relationship for Cauchy)
x0_init = float(np.median(x))
q25, q75 = np.quantile(x, [0.25, 0.75])
gamma_init = float(max((q75 - q25) / 2.0, 1e-6))
res = optimize.minimize(
nll,
x0=np.array([x0_init, np.log(gamma_init)]),
method="Nelder-Mead",
options={"maxiter": 5000},
)
x0_hat, log_gamma_hat = res.x
return float(x0_hat), float(np.exp(log_gamma_hat))
# Demonstrate likelihood estimation on a sample
true_x0, true_gamma = 0.0, 1.0
x_sample = cauchy.rvs(loc=true_x0, scale=true_gamma, size=200, random_state=rng)
x0_hat, gamma_hat = cauchy_mle_scipy(x_sample)
print("true (x0, γ) =", (true_x0, true_gamma))
print("MLE (x0, γ) =", (x0_hat, gamma_hat))
# Compare to robust quantile-based estimator
x0_med = float(np.median(x_sample))
q25, q75 = np.quantile(x_sample, [0.25, 0.75])
gamma_iqr = float((q75 - q25) / 2.0)
print("median/IQR (x0, γ) =", (x0_med, gamma_iqr))
true (x0, γ) = (0.0, 1.0)
MLE (x0, γ) = (0.023594778811368573, 1.001232560754852)
median/IQR (x0, γ) = (0.04780667370458491, 1.0239694365750378)
7) Sampling & simulation (NumPy-only)#
Inverse CDF method#
Because we have a closed-form inverse CDF, sampling is straightforward:
Draw \(U \sim \mathrm{Unif}(0,1)\)
Return \(X = F^{-1}(U) = x_0 + \gamma\tan\bigl(\pi(U-1/2)\bigr)\)
This works because \(F^{-1}(U)\) has CDF \(F\) whenever \(U \sim \mathrm{Unif}(0,1)\) (the probability integral transform).
Ratio-of-normals method (alternative)#
If \(Z_1, Z_2 \sim \mathcal{N}(0,1)\) i.i.d., then \(Z_1/Z_2\) is standard Cauchy. This provides another simple sampler.
We’ll implement both with NumPy and verify they agree.
def sample_cauchy_inverse_cdf(
rng: np.random.Generator,
size: int,
x0: float = 0.0,
gamma: float = 1.0,
eps: float = 1e-12,
) -> np.ndarray:
if gamma <= 0:
raise ValueError("gamma must be > 0")
u = rng.random(size)
u = np.clip(u, eps, 1.0 - eps) # avoid tan(±π/2)
return x0 + gamma * np.tan(np.pi * (u - 0.5))
def sample_cauchy_ratio_normals(
rng: np.random.Generator,
size: int,
x0: float = 0.0,
gamma: float = 1.0,
) -> np.ndarray:
if gamma <= 0:
raise ValueError("gamma must be > 0")
z1 = rng.standard_normal(size)
z2 = rng.standard_normal(size)
return x0 + gamma * (z1 / z2)
# Quick sampler comparison
n = 200_000
x_inv = sample_cauchy_inverse_cdf(rng, n)
x_rat = sample_cauchy_ratio_normals(rng, n)
# Compare a few robust summaries (means are meaningless here)
for name, x in [("inverse", x_inv), ("ratio", x_rat)]:
med = float(np.median(x))
q25, q75 = np.quantile(x, [0.25, 0.75])
print(name, "median", med, "IQR/2", (q75 - q25) / 2.0)
inverse median 0.003911078464110154 IQR/2 0.9991072407476334
ratio median 0.0014358686297880883 IQR/2 1.0013452766464317
8) Visualization#
We’ll visualize:
the PDF and CDF
a histogram of Monte Carlo samples with PDF overlay
why the running mean fails to stabilize for Cauchy samples
# PDF and CDF (standard Cauchy)
x = np.linspace(-10, 10, 4000)
fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))
fig.add_trace(go.Scatter(x=x, y=cauchy_pdf(x), mode="lines", name="pdf"), row=1, col=1)
fig.add_trace(go.Scatter(x=x, y=cauchy_cdf(x), mode="lines", name="cdf"), row=1, col=2)
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_yaxes(title_text="f(x)", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="F(x)", row=1, col=2)
fig.update_layout(width=950, height=380, showlegend=False)
fig.show()
# Monte Carlo samples: histogram + PDF overlay
n = 200_000
x_samp = sample_cauchy_inverse_cdf(rng, n)
# Clip for visualization only (Cauchy produces extreme values)
clip = 25
x_vis = x_samp[np.abs(x_samp) <= clip]
xbins = np.linspace(-clip, clip, 120)
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=x_vis,
xbins=dict(start=-clip, end=clip, size=xbins[1] - xbins[0]),
histnorm="probability density",
name="samples",
)
)
xgrid = np.linspace(-clip, clip, 2000)
fig.add_trace(go.Scatter(x=xgrid, y=cauchy_pdf(xgrid), mode="lines", name="true pdf", line=dict(width=3)))
fig.update_layout(
title=f"Cauchy samples (n={n:,}) — histogram clipped to |x|≤{clip} for readability",
xaxis_title="x",
yaxis_title="density",
width=950,
height=450,
)
fig.show()
print("fraction clipped:", 1.0 - (len(x_vis) / len(x_samp)))
fraction clipped: 0.025394999999999945
# Running mean vs running median: mean doesn't stabilize
n = 50_000
x = sample_cauchy_inverse_cdf(rng, n)
running_mean = np.cumsum(x) / (np.arange(n) + 1)
# Running median (O(n^2) naive) is expensive; approximate using block medians.
block = 200
m = n // block
block_medians = np.array([np.median(x[i * block : (i + 1) * block]) for i in range(m)])
running_block_median = np.cumsum(block_medians) / (np.arange(m) + 1)
fig = make_subplots(
rows=2,
cols=1,
shared_xaxes=False,
vertical_spacing=0.12,
subplot_titles=("Running mean (highly unstable)", f"Average of block medians (block={block})"),
)
fig.add_trace(go.Scatter(x=np.arange(n), y=running_mean, mode="lines", line=dict(width=1)), row=1, col=1)
fig.update_yaxes(title_text="mean", row=1, col=1)
fig.add_trace(go.Scatter(x=np.arange(m) * block, y=running_block_median, mode="lines", line=dict(width=2)), row=2, col=1)
fig.update_xaxes(title_text="sample index", row=2, col=1)
fig.update_yaxes(title_text="avg median", row=2, col=1)
fig.update_layout(width=950, height=650, showlegend=False)
fig.show()
9) SciPy integration (scipy.stats.cauchy)#
SciPy parameterization matches the usual location/scale form:
`loc` = \(x_0\)
`scale` = \(\gamma\)
Key methods:
`cauchy.pdf(x, loc, scale)`
`cauchy.cdf(x, loc, scale)`
`cauchy.rvs(loc, scale, size, random_state)`
`cauchy.fit(data)` (MLE)
# Basic SciPy usage
x0, gamma = 1.5, 0.8
x = np.array([-2.0, 0.0, 2.0])
print("pdf:", cauchy.pdf(x, loc=x0, scale=gamma))
print("cdf:", cauchy.cdf(x, loc=x0, scale=gamma))
samples = cauchy.rvs(loc=x0, scale=gamma, size=5, random_state=rng)
print("rvs:", samples)
# Fit (MLE) from samples
n = 5_000
data = cauchy.rvs(loc=x0, scale=gamma, size=n, random_state=rng)
loc_hat, scale_hat = cauchy.fit(data) # returns (loc, scale)
# Robust quantile-based estimate (exact IQR relationship)
loc_med = float(np.median(data))
q25, q75 = np.quantile(data, [0.25, 0.75])
scale_iqr = float((q75 - q25) / 2.0)
print("true (loc, scale) =", (x0, gamma))
print("fit (loc, scale) =", (loc_hat, scale_hat))
print("IQR (loc, scale) =", (loc_med, scale_iqr))
pdf: [0.0198 0.0881 0.2861]
cdf: [0.0715 0.156 0.6778]
rvs: [0.0955 0.4031 6.9146 1.3602 2.5764]
true (loc, scale) = (1.5, 0.8)
fit (loc, scale) = (1.4971939014159894, 0.7836337981785467)
IQR (loc, scale) = (1.4846293035097757, 0.7825255275576956)
10) Statistical use cases#
Hypothesis testing (location)#
Because the mean is undefined, tests based on the sample mean are not appropriate.
A simple alternative is to test the location using the sample median. For known \(\gamma\), the sample median \(\hat{m}\) is asymptotically normal:
\[
\hat{m} \;\dot\sim\; \mathcal{N}\!\left(x_0,\; \frac{1}{4 n f(x_0)^2}\right) = \mathcal{N}\!\left(x_0,\; \frac{\pi^2 \gamma^2}{4n}\right),
\]
since the density at the median is \(f(x_0) = \frac{1}{\pi\gamma}\).
So an approximate z-test for \(H_0: x_0=x_{0,0}\) uses:
\[
z = \frac{\hat{m} - x_{0,0}}{\pi\gamma / (2\sqrt{n})}
\]
Bayesian modeling#
Cauchy likelihood: a robust alternative to Gaussian noise (strong outlier downweighting).
(Half-)Cauchy priors: common weakly-informative priors for scales (e.g., hierarchical standard deviations).
Generative modeling#
As a heavy-tailed component/noise distribution.
As a stable distribution: sums of independent Cauchy variables remain Cauchy (up to parameter updates).
# Location test via the sample median (known γ)
def median_z_test_cauchy_location(x: np.ndarray, x0_null: float, gamma: float) -> tuple[float, float]:
"""Approximate two-sided z-test for the location using the sample median.
Returns (z, p_value) using asymptotic normality of the median.
"""
x = np.asarray(x, dtype=float)
if gamma <= 0:
raise ValueError("gamma must be > 0")
n = x.size
med = float(np.median(x))
se = (np.pi * gamma) / (2.0 * np.sqrt(n))
z = (med - x0_null) / se
# two-sided p-value under N(0,1)
p = 2.0 * (1.0 - norm.cdf(abs(z)))
return float(z), float(p)
# Simulate data under H0 and H1
n = 301
true_x0, gamma = 0.0, 1.0
x_h0 = cauchy.rvs(loc=true_x0, scale=gamma, size=n, random_state=rng)
print("H0 example:", median_z_test_cauchy_location(x_h0, x0_null=0.0, gamma=gamma))
x_h1 = cauchy.rvs(loc=0.7, scale=gamma, size=n, random_state=rng)
print("H1 example:", median_z_test_cauchy_location(x_h1, x0_null=0.0, gamma=gamma))
H0 example: (2.275109825687059, 0.022899342386556443)
H1 example: (6.409909935899163, 1.4560552763498436e-10)
# A simple Bayesian example: posterior over x0 with known γ (grid approximation)
# Model: x_i ~ Cauchy(x0, γ), prior: x0 ~ Cauchy(0, τ)
tau = 2.0
true_x0, gamma = 1.0, 1.0
x = cauchy.rvs(loc=true_x0, scale=gamma, size=50, random_state=rng)
# Grid over x0
grid = np.linspace(-8, 8, 4001)
dx = grid[1] - grid[0]
log_prior = cauchy_logpdf(grid, x0=0.0, gamma=tau)
log_like = np.sum(cauchy_logpdf(x[:, None], x0=grid[None, :], gamma=gamma), axis=0)
log_post_unnorm = log_prior + log_like
# Stabilize and normalize
log_post_unnorm -= np.max(log_post_unnorm)
post = np.exp(log_post_unnorm)
post /= np.trapz(post, grid)
post_cdf = np.cumsum(post) * dx
post_cdf /= post_cdf[-1]
# Posterior summaries
x0_map = float(grid[np.argmax(post)])
# posterior median
x0_med = float(np.interp(0.5, post_cdf, grid))
fig = go.Figure()
fig.add_trace(go.Scatter(x=grid, y=post, mode="lines", name="posterior density"))
fig.add_vline(x=true_x0, line=dict(dash="dash"), annotation_text="true x0")
fig.add_vline(x=x0_map, line=dict(dash="dot"), annotation_text="MAP")
fig.add_vline(x=x0_med, line=dict(dash="dot"), annotation_text="post median")
fig.update_layout(
title="Posterior over location x0 (Cauchy likelihood, Cauchy prior; γ known)",
xaxis_title="x0",
yaxis_title="density",
width=950,
height=420,
)
fig.show()
print("true x0:", true_x0)
print("MAP:", x0_map)
print("posterior median:", x0_med)
true x0: 1.0
MAP: 0.9440000000000008
posterior median: 0.9388992434096356
# Generative property: stability under addition
# If X ~ Cauchy(x0, γ) and Y ~ Cauchy(y0, δ) independent,
# then X+Y ~ Cauchy(x0+y0, γ+δ).
n = 200_000
x0, gamma = 0.0, 1.0
y0, delta = 0.0, 2.0
x = sample_cauchy_inverse_cdf(rng, n, x0=x0, gamma=gamma)
y = sample_cauchy_inverse_cdf(rng, n, x0=y0, gamma=delta)
z = x + y
# Compare robust estimates (median and IQR/2)
med_z = float(np.median(z))
q25, q75 = np.quantile(z, [0.25, 0.75])
scale_z = float((q75 - q25) / 2.0)
print("theory median:", x0 + y0)
print("empirical median:", med_z)
print("theory scale:", gamma + delta)
print("empirical scale (IQR/2):", scale_z)
theory median: 0.0
empirical median: -0.007899555278458287
theory scale: 3.0
empirical scale (IQR/2): 2.995657428390161
11) Pitfalls#
Invalid parameters: the scale must satisfy \(\gamma>0\).
Mean/variance-based methods break: sample mean is not consistent and does not stabilize; moment-matching and CLT-based standard errors do not apply.
Extreme values are normal: Monte Carlo histograms often need clipping for visualization.
Numerical issues in sampling: the inverse CDF uses \(\tan(\cdot)\), which explodes near \(\pm\pi/2\); clip uniform draws away from 0 and 1.
Likelihood optimization: the log-likelihood can be relatively flat with multiple local optima for small samples; use robust initialization (median, IQR/2).
Use `logpdf` for products: working in log-space avoids underflow when multiplying many densities.
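The failure of mean-based methods follows from stability: the sample mean of \(n\) i.i.d. Cauchy draws has exactly the same distribution as a single draw. A short Monte Carlo sketch of this fact:

```python
import numpy as np
from scipy.stats import cauchy

# The mean of n i.i.d. Cauchy(x0, gamma) draws is again Cauchy(x0, gamma),
# so the spread (IQR/2) of the sample mean does not shrink as n grows.
rng = np.random.default_rng(0)
for n in [1, 10, 1000]:
    means = cauchy.rvs(size=(20_000, n), random_state=rng).mean(axis=1)
    q25, q75 = np.quantile(means, [0.25, 0.75])
    print(f"n={n:5d}  IQR/2 of the sample mean: {(q75 - q25) / 2.0:.3f}")
```

For a Gaussian the IQR of the mean would shrink like \(1/\sqrt{n}\); here it stays at \(\gamma\) regardless of \(n\).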
12) Summary#
`cauchy` is a continuous, heavy-tailed distribution on \(\mathbb{R}\) with parameters \((x_0,\gamma)\).
The PDF/CDF/PPF have simple closed forms; sampling via inverse CDF is easy.
Mean and variance do not exist; prefer median/quantiles/IQR for summaries.
It is Student-\(t\) with \(\nu=1\) and is stable under addition.
In practice it is useful for robust modeling and as a (half-)Cauchy prior for scale parameters.
References:
SciPy docs: `scipy.stats.cauchy`
Johnson, Kotz, Balakrishnan — Continuous Univariate Distributions